{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Analyze a large dataset with Google BigQuery\n",
    "\n",
    "**Learning Objectives**\n",
    "\n",
    "1. Access an ecommerce dataset\n",
    "1. Look at the dataset metadata\n",
    "1. Remove duplicate entries\n",
    "1. Write and execute queries\n",
    "\n",
    "\n",
    "## Introduction \n",
    "BigQuery is Google's fully managed, NoOps, low cost analytics database. With BigQuery you can query terabytes and terabytes of data without having any infrastructure to manage or needing a database administrator. BigQuery uses SQL and can take advantage of the pay-as-you-go model. BigQuery allows you to focus on analyzing data to find meaningful insights.\n",
    "\n",
    "We have a publicly available ecommerce dataset that has millions of Google Analytics records for the Google Merchandise Store loaded into a table in BigQuery. In this lab, you use a copy of that dataset. Sample scenarios are provided, from which you look at the data and ways to remove duplicate information. The lab then steps you through further analysis the data.\n",
    "\n",
    "BigQuery can be accessed by its own browser-based interface, Google Data Studio, and many third party tools.  In this lab you will use the BigQuery directly in notebook cells using the iPython magic command `%%bigquery`.\n",
    "\n",
    "The steps you will follow in the lab are analogous to what you would do to prepare data for use in advanced ML operations. You will follow the notebook to experiment with the BigQuery queries provided to analyze the data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Set up the notebook environment\n",
    "\n",
    "__VERY IMPORTANT__:  In the cell below you must replace the text `<YOUR PROJECT>` with you GCP project id."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "PROJECT = \"<YOUR PROJECT>\" #TODO Replace with your project id\n",
    "\n",
    "os.environ[\"PROJECT\"] = PROJECT\n",
    "\n",
    "pd.options.display.max_columns = 50"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Explore eCommerce data and identify duplicate records\n",
    "\n",
    "Scenario: You were provided with Google Analytics logs for an eCommerce website in a BigQuery dataset.  The data analyst team created a new BigQuery table of all the raw eCommerce visitor session data.  This data tracks user interactions, location, device types, time on page, and details of any transaction.  Your ultimate plan is to use this data in an ML capacity to create a model that delivers highly accurate predictions of user behavior to support tailored marketing campaigns.\n",
    "\n",
    "First, a few notes on BigQuery within a python notebook context.  Any cell that starts with `%%bigquery` (the BigQuery Magic) will be interpreted as a SQL query that is executed on BigQuery, and the result is printed to our notebook.\n",
    "\n",
    "BigQuery supports [two flavors](https://cloud.google.com/bigquery/docs/reference/standard-sql/migrating-from-legacy-sql#comparison_of_legacy_and_standard_sql) of SQL syntax: legacy SQL and standard SQL. The preferred is standard SQL because it complies with the official SQL:2011 standard. To instruct BigQuery to interpret our syntax as such we start the query with `#standardSQL`.\n",
    "\n",
    "Our first query is accessing the BigQuery Information Schema which stores all object-related metadata.  In this case we want to see metadata details for the \"all_sessions_raw\" table. \n",
    "\n",
    "Tip: To run the current cell you can click the cell and hit **shift enter**"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "TODO 2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>table_name</th>\n",
       "      <th>column_name</th>\n",
       "      <th>ordinal_position</th>\n",
       "      <th>is_nullable</th>\n",
       "      <th>data_type</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>fullVisitorId</td>\n",
       "      <td>1</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>channelGrouping</td>\n",
       "      <td>2</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>time</td>\n",
       "      <td>3</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>country</td>\n",
       "      <td>4</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>city</td>\n",
       "      <td>5</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>totalTransactionRevenue</td>\n",
       "      <td>6</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>transactions</td>\n",
       "      <td>7</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>timeOnSite</td>\n",
       "      <td>8</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>8</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>pageviews</td>\n",
       "      <td>9</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>9</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>sessionQualityDim</td>\n",
       "      <td>10</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>10</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>date</td>\n",
       "      <td>11</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>11</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>visitId</td>\n",
       "      <td>12</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>12</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>type</td>\n",
       "      <td>13</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>13</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>productRefundAmount</td>\n",
       "      <td>14</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>14</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>productQuantity</td>\n",
       "      <td>15</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>15</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>productPrice</td>\n",
       "      <td>16</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>16</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>productRevenue</td>\n",
       "      <td>17</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>17</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>productSKU</td>\n",
       "      <td>18</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>18</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>v2ProductName</td>\n",
       "      <td>19</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>19</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>v2ProductCategory</td>\n",
       "      <td>20</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>20</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>productVariant</td>\n",
       "      <td>21</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>21</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>currencyCode</td>\n",
       "      <td>22</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>22</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>itemQuantity</td>\n",
       "      <td>23</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>23</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>itemRevenue</td>\n",
       "      <td>24</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>24</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>transactionRevenue</td>\n",
       "      <td>25</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>25</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>transactionId</td>\n",
       "      <td>26</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>26</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>pageTitle</td>\n",
       "      <td>27</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>27</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>searchKeyword</td>\n",
       "      <td>28</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>28</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>pagePathLevel1</td>\n",
       "      <td>29</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>29</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>eCommerceAction_type</td>\n",
       "      <td>30</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>30</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>eCommerceAction_step</td>\n",
       "      <td>31</td>\n",
       "      <td>YES</td>\n",
       "      <td>INT64</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>31</td>\n",
       "      <td>all_sessions_raw</td>\n",
       "      <td>eCommerceAction_option</td>\n",
       "      <td>32</td>\n",
       "      <td>YES</td>\n",
       "      <td>STRING</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "          table_name              column_name  ordinal_position is_nullable  \\\n",
       "0   all_sessions_raw            fullVisitorId                 1         YES   \n",
       "1   all_sessions_raw          channelGrouping                 2         YES   \n",
       "2   all_sessions_raw                     time                 3         YES   \n",
       "3   all_sessions_raw                  country                 4         YES   \n",
       "4   all_sessions_raw                     city                 5         YES   \n",
       "5   all_sessions_raw  totalTransactionRevenue                 6         YES   \n",
       "6   all_sessions_raw             transactions                 7         YES   \n",
       "7   all_sessions_raw               timeOnSite                 8         YES   \n",
       "8   all_sessions_raw                pageviews                 9         YES   \n",
       "9   all_sessions_raw        sessionQualityDim                10         YES   \n",
       "10  all_sessions_raw                     date                11         YES   \n",
       "11  all_sessions_raw                  visitId                12         YES   \n",
       "12  all_sessions_raw                     type                13         YES   \n",
       "13  all_sessions_raw      productRefundAmount                14         YES   \n",
       "14  all_sessions_raw          productQuantity                15         YES   \n",
       "15  all_sessions_raw             productPrice                16         YES   \n",
       "16  all_sessions_raw           productRevenue                17         YES   \n",
       "17  all_sessions_raw               productSKU                18         YES   \n",
       "18  all_sessions_raw            v2ProductName                19         YES   \n",
       "19  all_sessions_raw        v2ProductCategory                20         YES   \n",
       "20  all_sessions_raw           productVariant                21         YES   \n",
       "21  all_sessions_raw             currencyCode                22         YES   \n",
       "22  all_sessions_raw             itemQuantity                23         YES   \n",
       "23  all_sessions_raw              itemRevenue                24         YES   \n",
       "24  all_sessions_raw       transactionRevenue                25         YES   \n",
       "25  all_sessions_raw            transactionId                26         YES   \n",
       "26  all_sessions_raw                pageTitle                27         YES   \n",
       "27  all_sessions_raw            searchKeyword                28         YES   \n",
       "28  all_sessions_raw           pagePathLevel1                29         YES   \n",
       "29  all_sessions_raw     eCommerceAction_type                30         YES   \n",
       "30  all_sessions_raw     eCommerceAction_step                31         YES   \n",
       "31  all_sessions_raw   eCommerceAction_option                32         YES   \n",
       "\n",
       "   data_type  \n",
       "0     STRING  \n",
       "1     STRING  \n",
       "2      INT64  \n",
       "3     STRING  \n",
       "4     STRING  \n",
       "5      INT64  \n",
       "6      INT64  \n",
       "7      INT64  \n",
       "8      INT64  \n",
       "9      INT64  \n",
       "10    STRING  \n",
       "11     INT64  \n",
       "12    STRING  \n",
       "13     INT64  \n",
       "14     INT64  \n",
       "15     INT64  \n",
       "16     INT64  \n",
       "17    STRING  \n",
       "18    STRING  \n",
       "19    STRING  \n",
       "20    STRING  \n",
       "21    STRING  \n",
       "22     INT64  \n",
       "23     INT64  \n",
       "24     INT64  \n",
       "25    STRING  \n",
       "26    STRING  \n",
       "27    STRING  \n",
       "28    STRING  \n",
       "29    STRING  \n",
       "30     INT64  \n",
       "31    STRING  "
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery --project $PROJECT\n",
    "#standardsql\n",
    "SELECT * \n",
    "EXCEPT \n",
    "      (table_catalog, table_schema, is_generated, generation_expression, is_stored, \n",
    "      is_updatable, is_hidden, is_system_defined, is_partitioning_column, clustering_ordinal_position)\n",
    "FROM  `data-to-insights.ecommerce.INFORMATION_SCHEMA.COLUMNS`\n",
    "WHERE table_name=\"all_sessions_raw\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next examine how many rows are in the table."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "TODO 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>f0_</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>21552195</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "        f0_\n",
       "0  21552195"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery --project $PROJECT \n",
    "#standardSQL\n",
    "SELECT count(*)\n",
    "FROM `data-to-insights.ecommerce.all_sessions_raw`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now take a quick at few rows of data in the table."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>fullVisitorId</th>\n",
       "      <th>channelGrouping</th>\n",
       "      <th>time</th>\n",
       "      <th>country</th>\n",
       "      <th>city</th>\n",
       "      <th>totalTransactionRevenue</th>\n",
       "      <th>transactions</th>\n",
       "      <th>timeOnSite</th>\n",
       "      <th>pageviews</th>\n",
       "      <th>sessionQualityDim</th>\n",
       "      <th>date</th>\n",
       "      <th>visitId</th>\n",
       "      <th>type</th>\n",
       "      <th>productRefundAmount</th>\n",
       "      <th>productQuantity</th>\n",
       "      <th>productPrice</th>\n",
       "      <th>productRevenue</th>\n",
       "      <th>productSKU</th>\n",
       "      <th>v2ProductName</th>\n",
       "      <th>v2ProductCategory</th>\n",
       "      <th>productVariant</th>\n",
       "      <th>currencyCode</th>\n",
       "      <th>itemQuantity</th>\n",
       "      <th>itemRevenue</th>\n",
       "      <th>transactionRevenue</th>\n",
       "      <th>transactionId</th>\n",
       "      <th>pageTitle</th>\n",
       "      <th>searchKeyword</th>\n",
       "      <th>pagePathLevel1</th>\n",
       "      <th>eCommerceAction_type</th>\n",
       "      <th>eCommerceAction_step</th>\n",
       "      <th>eCommerceAction_option</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>140505387240227138</td>\n",
       "      <td>Direct</td>\n",
       "      <td>0</td>\n",
       "      <td>Ghana</td>\n",
       "      <td>not available in demo dataset</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "      <td>20161007</td>\n",
       "      <td>1475872666</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>9990000</td>\n",
       "      <td>None</td>\n",
       "      <td>GGOEGDHG014499</td>\n",
       "      <td>Google Infuser-Top Water Bottle</td>\n",
       "      <td>Home/Drinkware/</td>\n",
       "      <td>(not set)</td>\n",
       "      <td>USD</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Drinkware | Google Merchandise Store</td>\n",
       "      <td>None</td>\n",
       "      <td>/store.html</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>2161300302561027053</td>\n",
       "      <td>Organic Search</td>\n",
       "      <td>0</td>\n",
       "      <td>India</td>\n",
       "      <td>not available in demo dataset</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "      <td>20170521</td>\n",
       "      <td>1495355218</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>99990000</td>\n",
       "      <td>None</td>\n",
       "      <td>GGOEGBRJ037299</td>\n",
       "      <td>Google Alpine Style Backpack</td>\n",
       "      <td>Home/Bags/</td>\n",
       "      <td>(not set)</td>\n",
       "      <td>USD</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Bags | Google Merchandise Store</td>\n",
       "      <td>None</td>\n",
       "      <td>/google+redesign/</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>5809923035957342173</td>\n",
       "      <td>Direct</td>\n",
       "      <td>0</td>\n",
       "      <td>United States</td>\n",
       "      <td>not available in demo dataset</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "      <td>20161008</td>\n",
       "      <td>1475972048</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>99990000</td>\n",
       "      <td>None</td>\n",
       "      <td>GGOEGBRA037499</td>\n",
       "      <td>Waterproof Backpack</td>\n",
       "      <td>Home/Bags/</td>\n",
       "      <td>(not set)</td>\n",
       "      <td>USD</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Bags</td>\n",
       "      <td>None</td>\n",
       "      <td>/google+redesign/</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>8178337623496064877</td>\n",
       "      <td>Referral</td>\n",
       "      <td>0</td>\n",
       "      <td>United States</td>\n",
       "      <td>Mountain View</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "      <td>20160909</td>\n",
       "      <td>1473454898</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>10990000</td>\n",
       "      <td>None</td>\n",
       "      <td>GGOEGCLB020832</td>\n",
       "      <td>Softsided Travel Pouch Set</td>\n",
       "      <td>Home/Bags/</td>\n",
       "      <td>(not set)</td>\n",
       "      <td>USD</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Bags</td>\n",
       "      <td>None</td>\n",
       "      <td>/google+redesign/</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>9730509990517739822</td>\n",
       "      <td>Direct</td>\n",
       "      <td>0</td>\n",
       "      <td>Canada</td>\n",
       "      <td>Toronto</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "      <td>20160928</td>\n",
       "      <td>1475105477</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>3500000</td>\n",
       "      <td>None</td>\n",
       "      <td>GGOEGBJR018199</td>\n",
       "      <td>Reusable Shopping Bag</td>\n",
       "      <td>Home/Bags/</td>\n",
       "      <td>(not set)</td>\n",
       "      <td>USD</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Bags</td>\n",
       "      <td>None</td>\n",
       "      <td>/google+redesign/</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>9977935485234401557</td>\n",
       "      <td>Referral</td>\n",
       "      <td>0</td>\n",
       "      <td>United Kingdom</td>\n",
       "      <td>London</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "      <td>20160811</td>\n",
       "      <td>1470905998</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>4990000</td>\n",
       "      <td>None</td>\n",
       "      <td>GGOEGOAA017199</td>\n",
       "      <td>Rubber Grip Ballpoint Pen 4 Pack</td>\n",
       "      <td>Home/Office/</td>\n",
       "      <td>(not set)</td>\n",
       "      <td>USD</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Office</td>\n",
       "      <td>None</td>\n",
       "      <td>/google+redesign/</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>0064667731979082203</td>\n",
       "      <td>Organic Search</td>\n",
       "      <td>0</td>\n",
       "      <td>United States</td>\n",
       "      <td>New York</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "      <td>20161012</td>\n",
       "      <td>1476318035</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>17990000</td>\n",
       "      <td>None</td>\n",
       "      <td>GGOEGOAB016099</td>\n",
       "      <td>Leather and Metal Ballpoint Pen</td>\n",
       "      <td>Home/Office/</td>\n",
       "      <td>(not set)</td>\n",
       "      <td>USD</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>Office | Google Merchandise Store</td>\n",
       "      <td>None</td>\n",
       "      <td>/google+redesign/</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "         fullVisitorId channelGrouping  time         country  \\\n",
       "0   140505387240227138          Direct     0           Ghana   \n",
       "1  2161300302561027053  Organic Search     0           India   \n",
       "2  5809923035957342173          Direct     0   United States   \n",
       "3  8178337623496064877        Referral     0   United States   \n",
       "4  9730509990517739822          Direct     0          Canada   \n",
       "5  9977935485234401557        Referral     0  United Kingdom   \n",
       "6  0064667731979082203  Organic Search     0   United States   \n",
       "\n",
       "                            city totalTransactionRevenue transactions  \\\n",
       "0  not available in demo dataset                    None         None   \n",
       "1  not available in demo dataset                    None         None   \n",
       "2  not available in demo dataset                    None         None   \n",
       "3                  Mountain View                    None         None   \n",
       "4                        Toronto                    None         None   \n",
       "5                         London                    None         None   \n",
       "6                       New York                    None         None   \n",
       "\n",
       "  timeOnSite  pageviews sessionQualityDim      date     visitId  type  \\\n",
       "0       None          1              None  20161007  1475872666  PAGE   \n",
       "1       None          1              None  20170521  1495355218  PAGE   \n",
       "2       None          1              None  20161008  1475972048  PAGE   \n",
       "3       None          1              None  20160909  1473454898  PAGE   \n",
       "4       None          1              None  20160928  1475105477  PAGE   \n",
       "5       None          1              None  20160811  1470905998  PAGE   \n",
       "6       None          1              None  20161012  1476318035  PAGE   \n",
       "\n",
       "  productRefundAmount productQuantity  productPrice productRevenue  \\\n",
       "0                None            None       9990000           None   \n",
       "1                None            None      99990000           None   \n",
       "2                None            None      99990000           None   \n",
       "3                None            None      10990000           None   \n",
       "4                None            None       3500000           None   \n",
       "5                None            None       4990000           None   \n",
       "6                None            None      17990000           None   \n",
       "\n",
       "       productSKU                     v2ProductName v2ProductCategory  \\\n",
       "0  GGOEGDHG014499   Google Infuser-Top Water Bottle   Home/Drinkware/   \n",
       "1  GGOEGBRJ037299      Google Alpine Style Backpack        Home/Bags/   \n",
       "2  GGOEGBRA037499               Waterproof Backpack        Home/Bags/   \n",
       "3  GGOEGCLB020832        Softsided Travel Pouch Set        Home/Bags/   \n",
       "4  GGOEGBJR018199             Reusable Shopping Bag        Home/Bags/   \n",
       "5  GGOEGOAA017199  Rubber Grip Ballpoint Pen 4 Pack      Home/Office/   \n",
       "6  GGOEGOAB016099   Leather and Metal Ballpoint Pen      Home/Office/   \n",
       "\n",
       "  productVariant currencyCode itemQuantity itemRevenue transactionRevenue  \\\n",
       "0      (not set)          USD         None        None               None   \n",
       "1      (not set)          USD         None        None               None   \n",
       "2      (not set)          USD         None        None               None   \n",
       "3      (not set)          USD         None        None               None   \n",
       "4      (not set)          USD         None        None               None   \n",
       "5      (not set)          USD         None        None               None   \n",
       "6      (not set)          USD         None        None               None   \n",
       "\n",
       "  transactionId                             pageTitle searchKeyword  \\\n",
       "0          None  Drinkware | Google Merchandise Store          None   \n",
       "1          None       Bags | Google Merchandise Store          None   \n",
       "2          None                                  Bags          None   \n",
       "3          None                                  Bags          None   \n",
       "4          None                                  Bags          None   \n",
       "5          None                                Office          None   \n",
       "6          None     Office | Google Merchandise Store          None   \n",
       "\n",
       "      pagePathLevel1 eCommerceAction_type  eCommerceAction_step  \\\n",
       "0        /store.html                    0                     1   \n",
       "1  /google+redesign/                    0                     1   \n",
       "2  /google+redesign/                    0                     1   \n",
       "3  /google+redesign/                    0                     1   \n",
       "4  /google+redesign/                    0                     1   \n",
       "5  /google+redesign/                    0                     1   \n",
       "6  /google+redesign/                    0                     1   \n",
       "\n",
       "  eCommerceAction_option  \n",
       "0                   None  \n",
       "1                   None  \n",
       "2                   None  \n",
       "3                   None  \n",
       "4                   None  \n",
       "5                   None  \n",
       "6                   None  "
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery --project $PROJECT \n",
    "#standardSQL\n",
    "SELECT *\n",
    "FROM `data-to-insights.ecommerce.all_sessions_raw`\n",
    "LIMIT 7"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Identify duplicate rows\n",
    "\n",
    "Seeing a sample amount of data may give you greater intuition for what is included in the dataset. But since the table is quite large, a preview is not likely to render meaningful results.  As you scan and scroll through the sample rows you see there is no singular field that uniquely identifies a row, so you need advanced logic to identify duplicate rows.\n",
    "\n",
    "The query below uses the SQL GROUP BY function on every field and counts (COUNT) where there are rows that have the same values across every field.\n",
    "\n",
    "If every field is unique, the COUNT will return 1 as there are no other groupings of rows with the exact same value for all fields.\n",
    "If there is a row with the same values for all fields, they will be grouped together and the COUNT will be greater than 1. The last part of the query is an aggregation filter using HAVING to only show the results that have a COUNT of duplicates greater than 1.\n",
    "Run the following query to find duplicate records across all columns."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "TODO 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>num_duplicate_rows</th>\n",
       "      <th>fullVisitorId</th>\n",
       "      <th>channelGrouping</th>\n",
       "      <th>time</th>\n",
       "      <th>country</th>\n",
       "      <th>city</th>\n",
       "      <th>totalTransactionRevenue</th>\n",
       "      <th>transactions</th>\n",
       "      <th>timeOnSite</th>\n",
       "      <th>pageviews</th>\n",
       "      <th>sessionQualityDim</th>\n",
       "      <th>date</th>\n",
       "      <th>visitId</th>\n",
       "      <th>type</th>\n",
       "      <th>productRefundAmount</th>\n",
       "      <th>productQuantity</th>\n",
       "      <th>productPrice</th>\n",
       "      <th>productRevenue</th>\n",
       "      <th>productSKU</th>\n",
       "      <th>v2ProductName</th>\n",
       "      <th>v2ProductCategory</th>\n",
       "      <th>productVariant</th>\n",
       "      <th>currencyCode</th>\n",
       "      <th>itemQuantity</th>\n",
       "      <th>itemRevenue</th>\n",
       "      <th>transactionRevenue</th>\n",
       "      <th>transactionId</th>\n",
       "      <th>pageTitle</th>\n",
       "      <th>searchKeyword</th>\n",
       "      <th>pagePathLevel1</th>\n",
       "      <th>eCommerceAction_type</th>\n",
       "      <th>eCommerceAction_step</th>\n",
       "      <th>eCommerceAction_option</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>2</td>\n",
       "      <td>4890832126611809362</td>\n",
       "      <td>Organic Search</td>\n",
       "      <td>0</td>\n",
       "      <td>Sri Lanka</td>\n",
       "      <td>not available in demo dataset</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20170610</td>\n",
       "      <td>1497087233</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20990000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>GGOEGAAX0356</td>\n",
       "      <td>YouTube Men's Vintage Tank</td>\n",
       "      <td>Home/Shop by Brand/YouTube/</td>\n",
       "      <td>(not set)</td>\n",
       "      <td>USD</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>YouTube | Shop by Brand | Google Merchandise S...</td>\n",
       "      <td>None</td>\n",
       "      <td>/google+redesign/</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>477997596462135678</td>\n",
       "      <td>Direct</td>\n",
       "      <td>653506</td>\n",
       "      <td>United States</td>\n",
       "      <td>San Jose</td>\n",
       "      <td>246000000.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>931.0</td>\n",
       "      <td>36</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20170118</td>\n",
       "      <td>1484796890</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>119000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>GGOENEBQ078999</td>\n",
       "      <td>Nest® Cam Outdoor Security Camera - USA</td>\n",
       "      <td>Nest-USA</td>\n",
       "      <td>Single Option Only</td>\n",
       "      <td>USD</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>ORD201701182177</td>\n",
       "      <td>Checkout Confirmation</td>\n",
       "      <td>None</td>\n",
       "      <td>/ordercompleted.html</td>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>4890832126611809362</td>\n",
       "      <td>Organic Search</td>\n",
       "      <td>0</td>\n",
       "      <td>Sri Lanka</td>\n",
       "      <td>not available in demo dataset</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>2</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20170610</td>\n",
       "      <td>1497087233</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>24990000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>GGOEYHPA003610</td>\n",
       "      <td>YouTube Wool Heather Cap Heather/Black</td>\n",
       "      <td>Home/Shop by Brand/YouTube/</td>\n",
       "      <td>(not set)</td>\n",
       "      <td>USD</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>YouTube | Shop by Brand | Google Merchandise S...</td>\n",
       "      <td>None</td>\n",
       "      <td>/google+redesign/</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>4</td>\n",
       "      <td>3211801082901013750</td>\n",
       "      <td>Referral</td>\n",
       "      <td>417832</td>\n",
       "      <td>United States</td>\n",
       "      <td>New York</td>\n",
       "      <td>409000000.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>418.0</td>\n",
       "      <td>23</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20170601</td>\n",
       "      <td>1496375475</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>1.0</td>\n",
       "      <td>100000000</td>\n",
       "      <td>102250000.0</td>\n",
       "      <td>GGOEGGCX056199</td>\n",
       "      <td>Gift Card- $100.00</td>\n",
       "      <td>Gift Cards</td>\n",
       "      <td>Single Option Only</td>\n",
       "      <td>USD</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>409000000.0</td>\n",
       "      <td>ORD201706013038</td>\n",
       "      <td>Checkout Confirmation</td>\n",
       "      <td>None</td>\n",
       "      <td>/ordercompleted.html</td>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>2</td>\n",
       "      <td>8306703617933176158</td>\n",
       "      <td>Organic Search</td>\n",
       "      <td>559533</td>\n",
       "      <td>Canada</td>\n",
       "      <td>not available in demo dataset</td>\n",
       "      <td>151000000.0</td>\n",
       "      <td>1.0</td>\n",
       "      <td>567.0</td>\n",
       "      <td>18</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20170511</td>\n",
       "      <td>1494505602</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>149000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>GGOENEBB081499</td>\n",
       "      <td>Nest® Cam Indoor Security Camera - CA</td>\n",
       "      <td>Nest-Canada</td>\n",
       "      <td>Single Option Only</td>\n",
       "      <td>USD</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>ORD201705112202</td>\n",
       "      <td>Checkout Confirmation</td>\n",
       "      <td>None</td>\n",
       "      <td>/ordercompleted.html</td>\n",
       "      <td>6</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "      <td>...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>610</td>\n",
       "      <td>2</td>\n",
       "      <td>6706553303219862080</td>\n",
       "      <td>Organic Search</td>\n",
       "      <td>78319</td>\n",
       "      <td>Australia</td>\n",
       "      <td>Melbourne</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>88.0</td>\n",
       "      <td>8</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20170301</td>\n",
       "      <td>1488431352</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>4990000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>GGOEYDHJ056099</td>\n",
       "      <td>22 oz YouTube Bottle Infuser</td>\n",
       "      <td>Home/Shop by Brand/YouTube/</td>\n",
       "      <td>(not set)</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>YouTube | Shop by Brand | Google Merchandise S...</td>\n",
       "      <td>None</td>\n",
       "      <td>/google+redesign/</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>611</td>\n",
       "      <td>4</td>\n",
       "      <td>9758235511148216157</td>\n",
       "      <td>Direct</td>\n",
       "      <td>682296</td>\n",
       "      <td>United States</td>\n",
       "      <td>Sunnyvale</td>\n",
       "      <td>134000000.0</td>\n",
       "      <td>2.0</td>\n",
       "      <td>705.0</td>\n",
       "      <td>22</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20170615</td>\n",
       "      <td>1497575168</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>25000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>GGOEGGCX056299</td>\n",
       "      <td>Gift Card - $25.00</td>\n",
       "      <td>Gift Cards</td>\n",
       "      <td>Single Option Only</td>\n",
       "      <td>USD</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>Payment Method</td>\n",
       "      <td>None</td>\n",
       "      <td>/payment.html</td>\n",
       "      <td>5</td>\n",
       "      <td>2</td>\n",
       "      <td>Payment</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>612</td>\n",
       "      <td>3</td>\n",
       "      <td>1915538933685278364</td>\n",
       "      <td>Referral</td>\n",
       "      <td>487234</td>\n",
       "      <td>United States</td>\n",
       "      <td>San Jose</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>638.0</td>\n",
       "      <td>22</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20161221</td>\n",
       "      <td>1482356339</td>\n",
       "      <td>PAGE</td>\n",
       "      <td>None</td>\n",
       "      <td>1.0</td>\n",
       "      <td>100000000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>GGOEGGCX056199</td>\n",
       "      <td>Gift Card- $100.00</td>\n",
       "      <td>Gift Cards</td>\n",
       "      <td>Single Option Only</td>\n",
       "      <td>USD</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>Checkout Your Information</td>\n",
       "      <td>None</td>\n",
       "      <td>/yourinfo.html</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>Billing and Shipping</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>613</td>\n",
       "      <td>2</td>\n",
       "      <td>8368489856222393707</td>\n",
       "      <td>Direct</td>\n",
       "      <td>331079</td>\n",
       "      <td>United States</td>\n",
       "      <td>Mountain View</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>423.0</td>\n",
       "      <td>13</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20170216</td>\n",
       "      <td>1487286356</td>\n",
       "      <td>EVENT</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>16990000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>GGOEGAAX0104</td>\n",
       "      <td>Google Men's 100% Cotton Short Sleeve Hero Tee...</td>\n",
       "      <td>Home/Apparel/Men's/</td>\n",
       "      <td>(not set)</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>Men's Apparel | Google Merchandise Store</td>\n",
       "      <td>None</td>\n",
       "      <td>/google+redesign/</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>614</td>\n",
       "      <td>2</td>\n",
       "      <td>917551604501805376</td>\n",
       "      <td>Direct</td>\n",
       "      <td>86855</td>\n",
       "      <td>Russia</td>\n",
       "      <td>not available in demo dataset</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>145.0</td>\n",
       "      <td>53</td>\n",
       "      <td>NaN</td>\n",
       "      <td>20170428</td>\n",
       "      <td>1493445465</td>\n",
       "      <td>EVENT</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>24990000</td>\n",
       "      <td>NaN</td>\n",
       "      <td>GGOEAHPA004110</td>\n",
       "      <td>Android Wool Heather Cap Heather/Black</td>\n",
       "      <td>Home/Shop by Brand/Android/</td>\n",
       "      <td>(not set)</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>None</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "      <td>Android | Shop by Brand | Google Merchandise S...</td>\n",
       "      <td>None</td>\n",
       "      <td>/store.html</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>615 rows × 33 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "     num_duplicate_rows        fullVisitorId channelGrouping    time  \\\n",
       "0                     2  4890832126611809362  Organic Search       0   \n",
       "1                     2   477997596462135678          Direct  653506   \n",
       "2                     2  4890832126611809362  Organic Search       0   \n",
       "3                     4  3211801082901013750        Referral  417832   \n",
       "4                     2  8306703617933176158  Organic Search  559533   \n",
       "..                  ...                  ...             ...     ...   \n",
       "610                   2  6706553303219862080  Organic Search   78319   \n",
       "611                   4  9758235511148216157          Direct  682296   \n",
       "612                   3  1915538933685278364        Referral  487234   \n",
       "613                   2  8368489856222393707          Direct  331079   \n",
       "614                   2   917551604501805376          Direct   86855   \n",
       "\n",
       "           country                           city  totalTransactionRevenue  \\\n",
       "0        Sri Lanka  not available in demo dataset                      NaN   \n",
       "1    United States                       San Jose              246000000.0   \n",
       "2        Sri Lanka  not available in demo dataset                      NaN   \n",
       "3    United States                       New York              409000000.0   \n",
       "4           Canada  not available in demo dataset              151000000.0   \n",
       "..             ...                            ...                      ...   \n",
       "610      Australia                      Melbourne                      NaN   \n",
       "611  United States                      Sunnyvale              134000000.0   \n",
       "612  United States                       San Jose                      NaN   \n",
       "613  United States                  Mountain View                      NaN   \n",
       "614         Russia  not available in demo dataset                      NaN   \n",
       "\n",
       "     transactions  timeOnSite  pageviews  sessionQualityDim      date  \\\n",
       "0             NaN         NaN          2                NaN  20170610   \n",
       "1             1.0       931.0         36                NaN  20170118   \n",
       "2             NaN         NaN          2                NaN  20170610   \n",
       "3             1.0       418.0         23                NaN  20170601   \n",
       "4             1.0       567.0         18                NaN  20170511   \n",
       "..            ...         ...        ...                ...       ...   \n",
       "610           NaN        88.0          8                NaN  20170301   \n",
       "611           2.0       705.0         22                NaN  20170615   \n",
       "612           NaN       638.0         22                NaN  20161221   \n",
       "613           NaN       423.0         13                NaN  20170216   \n",
       "614           NaN       145.0         53                NaN  20170428   \n",
       "\n",
       "        visitId   type productRefundAmount  productQuantity  productPrice  \\\n",
       "0    1497087233   PAGE                None              NaN      20990000   \n",
       "1    1484796890   PAGE                None              NaN     119000000   \n",
       "2    1497087233   PAGE                None              NaN      24990000   \n",
       "3    1496375475   PAGE                None              1.0     100000000   \n",
       "4    1494505602   PAGE                None              NaN     149000000   \n",
       "..          ...    ...                 ...              ...           ...   \n",
       "610  1488431352   PAGE                None              NaN       4990000   \n",
       "611  1497575168   PAGE                None              NaN      25000000   \n",
       "612  1482356339   PAGE                None              1.0     100000000   \n",
       "613  1487286356  EVENT                None              NaN      16990000   \n",
       "614  1493445465  EVENT                None              NaN      24990000   \n",
       "\n",
       "     productRevenue      productSKU  \\\n",
       "0               NaN    GGOEGAAX0356   \n",
       "1               NaN  GGOENEBQ078999   \n",
       "2               NaN  GGOEYHPA003610   \n",
       "3       102250000.0  GGOEGGCX056199   \n",
       "4               NaN  GGOENEBB081499   \n",
       "..              ...             ...   \n",
       "610             NaN  GGOEYDHJ056099   \n",
       "611             NaN  GGOEGGCX056299   \n",
       "612             NaN  GGOEGGCX056199   \n",
       "613             NaN    GGOEGAAX0104   \n",
       "614             NaN  GGOEAHPA004110   \n",
       "\n",
       "                                         v2ProductName  \\\n",
       "0                           YouTube Men's Vintage Tank   \n",
       "1              Nest® Cam Outdoor Security Camera - USA   \n",
       "2               YouTube Wool Heather Cap Heather/Black   \n",
       "3                                   Gift Card- $100.00   \n",
       "4                Nest® Cam Indoor Security Camera - CA   \n",
       "..                                                 ...   \n",
       "610                       22 oz YouTube Bottle Infuser   \n",
       "611                                 Gift Card - $25.00   \n",
       "612                                 Gift Card- $100.00   \n",
       "613  Google Men's 100% Cotton Short Sleeve Hero Tee...   \n",
       "614             Android Wool Heather Cap Heather/Black   \n",
       "\n",
       "               v2ProductCategory      productVariant currencyCode  \\\n",
       "0    Home/Shop by Brand/YouTube/           (not set)          USD   \n",
       "1                       Nest-USA  Single Option Only          USD   \n",
       "2    Home/Shop by Brand/YouTube/           (not set)          USD   \n",
       "3                     Gift Cards  Single Option Only          USD   \n",
       "4                    Nest-Canada  Single Option Only          USD   \n",
       "..                           ...                 ...          ...   \n",
       "610  Home/Shop by Brand/YouTube/           (not set)         None   \n",
       "611                   Gift Cards  Single Option Only          USD   \n",
       "612                   Gift Cards  Single Option Only          USD   \n",
       "613          Home/Apparel/Men's/           (not set)         None   \n",
       "614  Home/Shop by Brand/Android/           (not set)         None   \n",
       "\n",
       "    itemQuantity itemRevenue  transactionRevenue    transactionId  \\\n",
       "0           None        None                 NaN             None   \n",
       "1           None        None                 NaN  ORD201701182177   \n",
       "2           None        None                 NaN             None   \n",
       "3           None        None         409000000.0  ORD201706013038   \n",
       "4           None        None                 NaN  ORD201705112202   \n",
       "..           ...         ...                 ...              ...   \n",
       "610         None        None                 NaN             None   \n",
       "611         None        None                 NaN             None   \n",
       "612         None        None                 NaN             None   \n",
       "613         None        None                 NaN             None   \n",
       "614         None        None                 NaN             None   \n",
       "\n",
       "                                             pageTitle searchKeyword  \\\n",
       "0    YouTube | Shop by Brand | Google Merchandise S...          None   \n",
       "1                                Checkout Confirmation          None   \n",
       "2    YouTube | Shop by Brand | Google Merchandise S...          None   \n",
       "3                                Checkout Confirmation          None   \n",
       "4                                Checkout Confirmation          None   \n",
       "..                                                 ...           ...   \n",
       "610  YouTube | Shop by Brand | Google Merchandise S...          None   \n",
       "611                                     Payment Method          None   \n",
       "612                          Checkout Your Information          None   \n",
       "613           Men's Apparel | Google Merchandise Store          None   \n",
       "614  Android | Shop by Brand | Google Merchandise S...          None   \n",
       "\n",
       "           pagePathLevel1 eCommerceAction_type  eCommerceAction_step  \\\n",
       "0       /google+redesign/                    0                     1   \n",
       "1    /ordercompleted.html                    6                     1   \n",
       "2       /google+redesign/                    0                     1   \n",
       "3    /ordercompleted.html                    6                     1   \n",
       "4    /ordercompleted.html                    6                     1   \n",
       "..                    ...                  ...                   ...   \n",
       "610     /google+redesign/                    2                     1   \n",
       "611         /payment.html                    5                     2   \n",
       "612        /yourinfo.html                    5                     1   \n",
       "613     /google+redesign/                    1                     1   \n",
       "614           /store.html                    1                     1   \n",
       "\n",
       "    eCommerceAction_option  \n",
       "0                     None  \n",
       "1                     None  \n",
       "2                     None  \n",
       "3                     None  \n",
       "4                     None  \n",
       "..                     ...  \n",
       "610                   None  \n",
       "611                Payment  \n",
       "612   Billing and Shipping  \n",
       "613                   None  \n",
       "614                   None  \n",
       "\n",
       "[615 rows x 33 columns]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery --project $PROJECT \n",
    "#standardSQL\n",
    "SELECT count(*) AS num_duplicate_rows, \n",
    "       * \n",
    "FROM   `data-to-insights.ecommerce.all_sessions_raw` \n",
    "GROUP BY fullvisitorid, \n",
    "          channelgrouping, \n",
    "          time, \n",
    "          country, \n",
    "          city, \n",
    "          totaltransactionrevenue, \n",
    "          transactions, \n",
    "          timeonsite, \n",
    "          pageviews, \n",
    "          sessionqualitydim, \n",
    "          date, \n",
    "          visitid, \n",
    "          type, \n",
    "          productrefundamount, \n",
    "          productquantity, \n",
    "          productprice, \n",
    "          productrevenue, \n",
    "          productsku, \n",
    "          v2productname, \n",
    "          v2productcategory, \n",
    "          productvariant, \n",
    "          currencycode, \n",
    "          itemquantity, \n",
    "          itemrevenue, \n",
    "          transactionrevenue, \n",
    "          transactionid, \n",
    "          pagetitle, \n",
    "          searchkeyword, \n",
    "          pagepathlevel1, \n",
    "          ecommerceaction_type, \n",
    "          ecommerceaction_step, \n",
    "          ecommerceaction_option \n",
    "HAVING num_duplicate_rows > 1; "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you can see there are quite a few \"duplicate\" records (615) when analyzed with these parameters.\n",
    "\n",
    "In your own datasets, even if you have a unique key, it is still beneficial to confirm the uniqueness of the rows with COUNT, GROUP BY, and HAVING before you begin your analysis."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Analyze the new all_sessions table\n",
    "\n",
    "In this section you use a deduplicated table called all_sessions.\n",
    "\n",
    "Scenario: Your data analyst team has provided you with a relevant query, and your schema experts have identified the key fields that must be unique for each record per your schema.\n",
    "\n",
    "Run the query to confirm that no duplicates exist, this time against the \"all_sessions\" table:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>fullvisitorid</th>\n",
       "      <th>visitid</th>\n",
       "      <th>date</th>\n",
       "      <th>time</th>\n",
       "      <th>v2productname</th>\n",
       "      <th>productsku</th>\n",
       "      <th>type</th>\n",
       "      <th>ecommerceaction_type</th>\n",
       "      <th>ecommerceaction_step</th>\n",
       "      <th>ecommerceaction_option</th>\n",
       "      <th>transactionrevenue</th>\n",
       "      <th>transactionid</th>\n",
       "      <th>row_count</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Empty DataFrame\n",
       "Columns: [fullvisitorid, visitid, date, time, v2productname, productsku, type, ecommerceaction_type, ecommerceaction_step, ecommerceaction_option, transactionrevenue, transactionid, row_count]\n",
       "Index: []"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery --project $PROJECT\n",
    "#standardSQL\n",
    "SELECT fullvisitorid, # the unique visitor ID \n",
    "       visitid, # a visitor can have multiple visits \n",
    "       date, # session date stored as string YYYYMMDD \n",
    "       time, # time of the individual site hit  (can be 0 or more) \n",
    "       v2productname, # not unique since a product can have variants like Color \n",
    "       productsku, # unique for each product \n",
    "       type, # visit and/or event trigger \n",
    "       ecommerceaction_type, # maps to ‘add to cart', ‘completed checkout' \n",
    "       ecommerceaction_step, \n",
    "       ecommerceaction_option, \n",
    "       transactionrevenue, # revenue of the order \n",
    "       transactionid, # unique identifier for revenue bearing transaction \n",
    "       count(*) AS row_count \n",
    "FROM   `data-to-insights.ecommerce.all_sessions` \n",
    "GROUP  BY 1, \n",
    "          2, \n",
    "          3, \n",
    "          4, \n",
    "          5, \n",
    "          6, \n",
    "          7, \n",
    "          8, \n",
    "          9, \n",
    "          10, \n",
    "          11, \n",
    "          12 \n",
    "HAVING row_count > 1 # find duplicates \n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The query returns zero records indicating no duplicates exist."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Write basic SQL against the eCommerce data (TODO 4)\n",
    "\n",
    "In this section, you query for insights on the ecommerce dataset.\n",
    "\n",
    "A good first path of analysis is to find the total unique visitors\n",
    "The query below determines the total views by counting product_views and the number of unique visitors by counting fullVisitorID."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>product_views</th>\n",
       "      <th>unique_visitors</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>21493109</td>\n",
       "      <td>389934</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   product_views  unique_visitors\n",
       "0       21493109           389934"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery --project $PROJECT\n",
    "#standardSQL\n",
    "SELECT count(*)                      AS product_views, \n",
    "       count(DISTINCT fullvisitorid) AS unique_visitors \n",
    "FROM   `data-to-insights.ecommerce.all_sessions`; "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The next query shows total unique visitors(fullVisitorID) by the referring site (channelGrouping):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>unique_visitors</th>\n",
       "      <th>channelgrouping</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>38101</td>\n",
       "      <td>Social</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>57308</td>\n",
       "      <td>Referral</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>11865</td>\n",
       "      <td>Paid Search</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>211993</td>\n",
       "      <td>Organic Search</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>3067</td>\n",
       "      <td>Display</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>5</td>\n",
       "      <td>75688</td>\n",
       "      <td>Direct</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>6</td>\n",
       "      <td>5966</td>\n",
       "      <td>Affiliates</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>7</td>\n",
       "      <td>62</td>\n",
       "      <td>(Other)</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   unique_visitors channelgrouping\n",
       "0            38101          Social\n",
       "1            57308        Referral\n",
       "2            11865     Paid Search\n",
       "3           211993  Organic Search\n",
       "4             3067         Display\n",
       "5            75688          Direct\n",
       "6             5966      Affiliates\n",
       "7               62         (Other)"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery --project $PROJECT\n",
    "#standardSQL\n",
    "SELECT count(DISTINCT fullvisitorid) AS unique_visitors, \n",
    "       channelgrouping \n",
    "FROM   `data-to-insights.ecommerce.all_sessions` \n",
    "GROUP  BY 2 \n",
    "ORDER  BY 2 DESC;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To find deeper insights in the data, the next query lists the five products with the most views (product_views) from unique visitors. The query counts number of times a product (v2ProductName) was viewed (product_views), puts the list in descending order, and lists the top 5 entries:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>product_views</th>\n",
       "      <th>ProductName</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>316482</td>\n",
       "      <td>Google Men's 100% Cotton Short Sleeve Hero Tee...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>221558</td>\n",
       "      <td>22 oz YouTube Bottle Infuser</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>210700</td>\n",
       "      <td>YouTube Men's Short Sleeve Hero Tee Black</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>202205</td>\n",
       "      <td>Google Men's 100% Cotton Short Sleeve Hero Tee...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>200789</td>\n",
       "      <td>YouTube Custom Decals</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   product_views                                        ProductName\n",
       "0         316482  Google Men's 100% Cotton Short Sleeve Hero Tee...\n",
       "1         221558                       22 oz YouTube Bottle Infuser\n",
       "2         210700          YouTube Men's Short Sleeve Hero Tee Black\n",
       "3         202205  Google Men's 100% Cotton Short Sleeve Hero Tee...\n",
       "4         200789                              YouTube Custom Decals"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery --project $PROJECT\n",
    "#standardSQL\n",
    "SELECT count(*)          AS product_views, \n",
    "       ( v2productname ) AS ProductName \n",
    "FROM   `data-to-insights.ecommerce.all_sessions` \n",
    "WHERE  type = 'PAGE' \n",
    "GROUP  BY v2productname \n",
    "ORDER  BY product_views DESC \n",
    "LIMIT  5;"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now expand your previous query to include the total number of distinct products ordered and the total number of total units ordered (productQuantity):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>product_views</th>\n",
       "      <th>orders</th>\n",
       "      <th>quantity_product_ordered</th>\n",
       "      <th>v2productname</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>316482</td>\n",
       "      <td>3158</td>\n",
       "      <td>6352</td>\n",
       "      <td>Google Men's 100% Cotton Short Sleeve Hero Tee...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>221558</td>\n",
       "      <td>508</td>\n",
       "      <td>4769</td>\n",
       "      <td>22 oz YouTube Bottle Infuser</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>210700</td>\n",
       "      <td>949</td>\n",
       "      <td>1114</td>\n",
       "      <td>YouTube Men's Short Sleeve Hero Tee Black</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>202205</td>\n",
       "      <td>2713</td>\n",
       "      <td>8072</td>\n",
       "      <td>Google Men's 100% Cotton Short Sleeve Hero Tee...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>200789</td>\n",
       "      <td>1703</td>\n",
       "      <td>11336</td>\n",
       "      <td>YouTube Custom Decals</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   product_views  orders  quantity_product_ordered  \\\n",
       "0         316482    3158                      6352   \n",
       "1         221558     508                      4769   \n",
       "2         210700     949                      1114   \n",
       "3         202205    2713                      8072   \n",
       "4         200789    1703                     11336   \n",
       "\n",
       "                                       v2productname  \n",
       "0  Google Men's 100% Cotton Short Sleeve Hero Tee...  \n",
       "1                       22 oz YouTube Bottle Infuser  \n",
       "2          YouTube Men's Short Sleeve Hero Tee Black  \n",
       "3  Google Men's 100% Cotton Short Sleeve Hero Tee...  \n",
       "4                              YouTube Custom Decals  "
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery --project $PROJECT\n",
    "#standardSQL\n",
    "SELECT count(*)               AS product_views, \n",
    "       count(productquantity) AS orders, \n",
    "       sum(productquantity)   AS quantity_product_ordered, \n",
    "       v2productname \n",
    "FROM   `data-to-insights.ecommerce.all_sessions` \n",
    "WHERE  type = 'PAGE' \n",
    "GROUP  BY v2productname \n",
    "ORDER  BY product_views DESC \n",
    "LIMIT  5; "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lastly, expand the query to include the average amount of product per order (total number of units ordered/total number of orders, or `SUM(productQuantity)/COUNT(productQuantity)`)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>product_views</th>\n",
       "      <th>orders</th>\n",
       "      <th>quantity_product_ordered</th>\n",
       "      <th>avg_per_order</th>\n",
       "      <th>productName</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <td>0</td>\n",
       "      <td>316482</td>\n",
       "      <td>3158</td>\n",
       "      <td>6352</td>\n",
       "      <td>2.011400</td>\n",
       "      <td>Google Men's 100% Cotton Short Sleeve Hero Tee...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>1</td>\n",
       "      <td>221558</td>\n",
       "      <td>508</td>\n",
       "      <td>4769</td>\n",
       "      <td>9.387795</td>\n",
       "      <td>22 oz YouTube Bottle Infuser</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>2</td>\n",
       "      <td>210700</td>\n",
       "      <td>949</td>\n",
       "      <td>1114</td>\n",
       "      <td>1.173867</td>\n",
       "      <td>YouTube Men's Short Sleeve Hero Tee Black</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>3</td>\n",
       "      <td>202205</td>\n",
       "      <td>2713</td>\n",
       "      <td>8072</td>\n",
       "      <td>2.975304</td>\n",
       "      <td>Google Men's 100% Cotton Short Sleeve Hero Tee...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <td>4</td>\n",
       "      <td>200789</td>\n",
       "      <td>1703</td>\n",
       "      <td>11336</td>\n",
       "      <td>6.656489</td>\n",
       "      <td>YouTube Custom Decals</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   product_views  orders  quantity_product_ordered  avg_per_order  \\\n",
       "0         316482    3158                      6352       2.011400   \n",
       "1         221558     508                      4769       9.387795   \n",
       "2         210700     949                      1114       1.173867   \n",
       "3         202205    2713                      8072       2.975304   \n",
       "4         200789    1703                     11336       6.656489   \n",
       "\n",
       "                                         productName  \n",
       "0  Google Men's 100% Cotton Short Sleeve Hero Tee...  \n",
       "1                       22 oz YouTube Bottle Infuser  \n",
       "2          YouTube Men's Short Sleeve Hero Tee Black  \n",
       "3  Google Men's 100% Cotton Short Sleeve Hero Tee...  \n",
       "4                              YouTube Custom Decals  "
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%%bigquery --project $PROJECT\n",
    "#standardSQL\n",
    "SELECT count(*)                                      AS product_views, \n",
    "       count(productquantity)                        AS orders, \n",
    "       sum(productquantity)                          AS quantity_product_ordered, \n",
    "       sum(productquantity) / Count(productquantity) AS avg_per_order, \n",
    "       v2productname                                 AS productName \n",
    "FROM   `data-to-insights.ecommerce.all_sessions` \n",
    "WHERE  type = 'PAGE' \n",
    "GROUP  BY v2productname \n",
    "ORDER  BY product_views DESC \n",
    "LIMIT  5; "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see that among these top 5 products by product views that the 22 oz YouTube Bottle Infuser had the highest avg_per_order with 9.38 units per order."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You have completed this lab exercise.   In this situation the \"all_sessions\" was provided to you with the deduplicated records.  In the course of your own future analysis you may have to create this on your own using BigQuery and the `create table DATASET.TABLE2 as select * from DATASET.TABLE1` syntax."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Copyright 2019 Google Inc.\n",
    "Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at\n",
    "http://www.apache.org/licenses/LICENSE-2.0\n",
    "Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
